Timetable week: 15
Topic: "Presenting quantitative results"

Intro

The aim of this final week is to gain some practice in getting the various outputs we have produced as part of a statistical analysis onto “paper”, as part of the process of writing up our findings into a report.

In every quantitative analysis, we need to ensure that our analysis is open and reproducible. This is increasingly a professional standard expected of all data analysts in the social sciences. This means that we need to have an efficient way in which to share our analysis code and well as our outputs and our interpretations of our findings with our readers. This is also what the assignment requires.

RStudio has an efficient way of handling this requirement with the use of the R markdown document format, .Rmd (a more recent version of which is the Quarto document type, .qmd, but we’ll stick with .Rmd for now).

In fact, these worksheets themselves were written in R Markdown, so you have already witnessed how the output may look like: we have this text you are reading right now, and we can also include and execute R code chunks such as this one:

```{r}
# Let's first (install and) load the packages we want to use.

install.packages("tidyverse")
install.packages("mosaic")
install.packages("sjmisc")
install.packages("jtools")

library(tidyverse)
library(mosaic)
library(sjmisc)
library(jtools)
  ```

While one can do amazing things with .Rmd documents and there are various settings that one can use to customise the output in every way possible and in various output formats, we’ll only cover the very basics that allow you to complete a full data analysis project (such as your Assignment 2!!) in an .Rmd document.

Readings

Instead of readings, you have some online training to complete this week:

And/or the following:

Further training:

For those interested in developing their R skills in creating publishable-quality visualisations (graphs, figures)

Secondary readings:

  • Neuman (2014): Chapter 15 (pp. 513–546)

Exercise 1: Open and get to know an .Rmd document

About 30 minutes


  1. As usual, open the R Studio interface by clicking on the SOC2069-Statistical-analysis.Rproj file included in the SOC2069-Statistical-analysis project folder that you downloaded from Canvas in Lab6. The folder should be stored on your Newcastle University OneDrive and accessible from any computer

    If you haven’t yet downloaded the project folder in TW11 (Lab6), then download it from Canvas.

  2. Not as usual, instead of opening a new R script for the session, open a new “R Markdown” document and give it the Title “Rmd practice” in the pop-up window (you can also add your name if you want, and the current date is included automatically, but you can change or delete that if you want):

or

When you open a new R Markdown document from the Menu, the document that opens up is actually a micro-tutorial showing you the basic elements of the document format. Read through the included information and then “knit” the document in both html and Microsoft Word format to see the results:

To get a Microsoft Word document as output:

To get an HTML document as the output:

The reason why you are getting outputs (the table and the figure) is that the document uses two dataset included by default in R (one called “cars”, the other “pressure”), so you don’t need to first load the data. Of course, for your own work, you will need to include the same code we have usually used in order to load the UKHLS dataset and complete your analysis.

You can compare the contents of the .Rmd document to those in the “knitted” output. Notice that lines starting with one or more hashtags (#) are transformed into section headings. For your assignment, you can use one hashtag (#) for the main title of your report, then two hashtags (##) for subsections like Introduction, Methods, etc. Further hashtags mean further heading levels, but for a very short essay/report, you don’t need more than two levels.

Edit the R Markdown practice document

For this quick exercise, let’s use the same “cars” dataset that is included with R and whose summary is already on the sheet.The dataset contains two variables that give the speed of cars and the distances taken to stop, recorded in the 1920s. (Type ?cars in the Console to open up a Help window briefly explaining where these data are from).

  • select and delete all the text under the heading “## Including Plots”
  • create a new code chunk under “## Including Plots”
  • inside the code chunk, write the command for a simple linear regression that regresses “dist” (the outcome) on “speed” (the predictor) and saves the regression output to an object called “practice_reg
  • In the same code chunk, under the regression command, write a command to request summary statistics for the regression object; use the “summary()” command from base R for simplicity
  • you can run the commands in a code chunk to see the results by clicking on the green arrow in the upper-right corner of the chunk
  • In a new line under the output, write a brief sentence about the result: what is the relationship between a car’s speed (measured in mph) and the distance it requires to stop (measured in ft)
  • finally, “knit” the document to Microsoft Word to see the results (Important: first, close the Word document from the previous knit if it’s still open, otherwise it cannot be overwritten and the knitting process will fail)

Note that tables do not print nicely to Microsoft Word. The HTML format is much more flexible, so will notice that if you knit to both HTML and Word, the HTML document will have much nicer formatting.

With additional settings, one can obtain tables of publishable quality, but we will not cover those options in this module. The Assignment 2 template file that you can download (see below) includes some settings that make all tables print in plain text format as they usually appear in the Console. This is far from “publishable” quality, but the aim is to be able to show results that you can interpret.

Exercise 2: Complete your assignment today!

To make your life easier, you can download another “R Project” folder for the assignment from here: https://cgmoreh.github.io/SOC2069/SOC2069_Assignment2.zip

The folder contains: - the UKHLS dataset (ukhls_w8.rds) - a template .Rmd document to use for the assignment, with some useful default settings included (“my_assignment_RENAME.Rmd”) - an Example.Rmd document that shows how the analysis would look like if you were doing Exercise 3 from Lab8 as your assignment question - a Microsoft Word document containing the knitted version of the Example.

Open the “my_assignment_RENAME.Rmd” document, read through it, and begin working on your assignment, following the same steps we practiced in previous weeks. If you need inspiration, look also at the Example.

The aim is to complete the data analysis part of the assignment today, in class. You can leave the literature review and discussion for later. But if you manage finish this exercise and export your results to Microsoft Word or to HTML, you won’t even need to use R Studio outside class to complete the assignment. You can just add the additional content directly onto the Word document as you would do with any other essay.

If you output to HTML instead, it should be easy to copy the output to a Word document or another word editor, for example:

Ideally, of course, you would complete the entire analysis and writing on the .Rmd document in R Studio, including the literature, reference list, etc. and only “knit” the final version that you are ready to submit.